@iabyn iabyn commented Oct 2, 2025

Rewrite perlxs.pod

This branch completely rewrites and modernises the XS reference manual,
perlxs.pod.

The new file is about twice the size of the old one.

This branch:

  • deletes some obsolete sections;

  • reorders the existing sections into a more logical order;

  • adds a large new introductory/overview part, which explains
    all the background needed to understand what XSUBs do, including
    SVs, the stack, reference counts, magic etc.

  • includes a BNF syntax section

  • modernises: e.g. it uses "ANSI" parameter syntax throughout

  • has a fully-worked example using T_PTROBJ

Note that although each commit in this branch may have a complex-looking
diff for the updating of a particular section, in reality most sections
have been rewritten from scratch, and the diff output is showing
paragraph breaks as fixed unchanging points, so that it appears as lots of
individual paragraph changes rather than "delete all this text, add new
text". If reviewing, it may be easier to just read the final perlxs.pod
file instead of looking at the diffs.

  • This set of changes requires a perldelta entry, and I will write one later

iabyn added 30 commits October 2, 2025 12:03
The XS parser supported an extremely obscure bit of functionality which
made use of the %v package variable to maintain state between different
bits of typemap processing. This was accidentally broken in 5.10.0:
refactoring removed the 'use vars "%v"' line, and no one seemed to
notice or care.

Also, the sole example of its use in the docs seemed to be obscure,
confusing and probably wrong.

There was a consensus in the discussion at

    http://nntp.perl.org/group/perl.perl5.porters/267667

that we should stop documenting this feature rather than trying to fix
it.
The various XS code examples had odd and inconsistent indentation (often
with 5 leading spaces) and inconsistent formatting, e.g. foo(a,b) vs
foo( a, b ) vs foo(a, b). Fix that, and also remove any tab chars.

Whitespace-only change.
This commit is a simple cut which deletes several '=head2' sections from
perlxs.pod. The next commit will tidy up and fix any broken links etc.

These sections are more tutorial-like, and aren't in line with the goal
of this branch that perlxs.pod becomes purely a reference manual for XS.
Any relevant information from these sections may be incorporated later
into new sections in perlxs.pod and/or be included in a future rewrite
of perlxstut.pod.

The sections deleted are:

    =head2 Introduction
    =head2 On The Road
    =head2 The Anatomy of an XSUB
    =head2 The Argument Stack
    =head2 The RETVAL Variable
    =head2 Returning SVs, AVs and HVs through RETVAL
    =head2 Returning Undef And Empty Lists
    =head2 Interface Strategy
    =head2 Perl Objects And C Structures
The previous commit deleted several sections from perlxs.pod. This
commit fixes things up; done as a separate commit so that the changes
aren't drowned out in the diff listing.
This big commit does a series of plain cut+pastes to reorder all the
=head2 sections within the file.

This changes the order from semi-random into roughly the order the
various XS keywords would appear within an XS file, and then within an
XSUB declaration/definition.

No changes have been made to the text: simply that all lines from a
particular '^=head2' up until the next head2 have been cut+paste as
a single unit.

No attempt has been made yet to make the text consistent with the new
ordering; that will be done by the subsequent commits of this branch.

The previous ordering in this file was:

    =head1 NAME
    =head1 DESCRIPTION
    =head2 The MODULE Keyword
    =head2 The PACKAGE Keyword
    =head2 The PREFIX Keyword
    =head2 The OUTPUT: Keyword
    =head2 The NO_OUTPUT Keyword
    =head2 The CODE: Keyword
    =head2 The INIT: Keyword
    =head2 The NO_INIT Keyword
    =head2 The TYPEMAP: Keyword
    =head2 Initializing Function Parameters
    =head2 Default Parameter Values
    =head2 The PREINIT: Keyword
    =head2 The SCOPE: Keyword
    =head2 The INPUT: Keyword
    =head2 The IN/OUTLIST/IN_OUTLIST/OUT/IN_OUT Keywords
    =head2 The C<length(NAME)> Keyword
    =head2 Variable-length Parameter Lists
    =head2 The C_ARGS: Keyword
    =head2 The PPCODE: Keyword
    =head2 The REQUIRE: Keyword
    =head2 The CLEANUP: Keyword
    =head2 The POSTCALL: Keyword
    =head2 The BOOT: Keyword
    =head2 The VERSIONCHECK: Keyword
    =head2 The PROTOTYPES: Keyword
    =head2 The PROTOTYPE: Keyword
    =head2 The ALIAS: Keyword
    =head2 The OVERLOAD: Keyword
    =head2 The FALLBACK: Keyword
    =head2 The INTERFACE: Keyword
    =head2 The INTERFACE_MACRO: Keyword
    =head2 The INCLUDE: Keyword
    =head2 The INCLUDE_COMMAND: Keyword
    =head2 The CASE: Keyword
    =head2 The EXPORT_XSUB_SYMBOLS: Keyword
    =head2 The & Unary Operator
    =head2 Inserting POD, Comments and C Preprocessor Directives
    =head2 Using XS With C++
    =head2 Safely Storing Static Data in XS
    =head3 MY_CXT REFERENCE
    =head1 EXAMPLES
    =head1 CAVEATS
    =head2 Use of standard C library functions
    =head2 Event loops and control flow
    =head1 XS VERSION
    =head1 AUTHOR DIAGNOSTICS
    =head1 AUTHOR

and is now:

    =head1 NAME
    =head1 DESCRIPTION
    =head2 The MODULE Keyword
    =head2 The PACKAGE Keyword
    =head2 The PREFIX Keyword
    =head2 Inserting POD, Comments and C Preprocessor Directives
    =head2 The REQUIRE: Keyword
    =head2 The VERSIONCHECK: Keyword
    =head2 The PROTOTYPES: Keyword
    =head2 The EXPORT_XSUB_SYMBOLS: Keyword
    =head2 The INCLUDE: Keyword
    =head2 The INCLUDE_COMMAND: Keyword
    =head2 The TYPEMAP: Keyword
    =head2 The BOOT: Keyword
    =head2 The FALLBACK: Keyword
    =head2 The NO_OUTPUT Keyword
    =head2 The IN/OUTLIST/IN_OUTLIST/OUT/IN_OUT Keywords
    =head2 Default Parameter Values
    =head2 The C<length(NAME)> Keyword
    =head2 Variable-length Parameter Lists
    =head2 The PREINIT: Keyword
    =head2 The INPUT: Keyword
    =head2 The NO_INIT Keyword
    =head2 Initializing Function Parameters
    =head2 The & Unary Operator
    =head2 The SCOPE: Keyword
    =head2 The INIT: Keyword
    =head2 The C_ARGS: Keyword
    =head2 The CODE: Keyword
    =head2 The PPCODE: Keyword
    =head2 The POSTCALL: Keyword
    =head2 The OUTPUT: Keyword
    =head2 The CLEANUP: Keyword
    =head2 The PROTOTYPE: Keyword
    =head2 The OVERLOAD: Keyword
    =head2 The ALIAS: Keyword
    =head2 The INTERFACE: Keyword
    =head2 The INTERFACE_MACRO: Keyword
    =head2 The CASE: Keyword
    =head2 Using XS With C++
    =head2 Safely Storing Static Data in XS
    =head3 MY_CXT REFERENCE
    =head1 EXAMPLES
    =head1 CAVEATS
    =head2 Use of standard C library functions
    =head2 Event loops and control flow
    =head1 XS VERSION
    =head1 AUTHOR DIAGNOSTICS
    =head1 AUTHOR
Following the previous commit's reordering of all the =head2
sections, demote most of the =head2 headers to =head3, and add some new
=head2 headers which group together related headers.

Also add some =head3's for a few missing keywords.

Subsequent commits will flesh out the new sections.
Four commits ago, I removed most of the general text sections in
perlxs (i.e. the ones not specifically about a particular keyword).

Now this commit adds a completely new introductory part to perlxs, about
1200 lines long. It is an attempt to explain the background (what XS,
XSUBs, SVs, typemaps etc. are) in a complete and modern way.
The existing reference section for each keyword follows it.

I tried to avoid getting too tutorial-like (that's what perlxstut is
for), but I may have crossed the line in various places. In particular
it has a new section which could have been titled "all the bits of
perlguts you need to know in order to write non-trivial XSUBs without
having to actually read perlguts".
Add a section which semi-formally tries to define the syntax and
structure of an XS file, using a BNF-like format.

See http://nntp.perl.org/group/perl.perl5.porters/268701 for the
discussion of this part.
Rewrite the POD for these three keywords, and in particular, treat
them as one declaration, rather than three unrelated keywords.
Populate the new

    =head2 File-scoped XS Keywords and Directives

section, partially by cannibalising (and then deleting) the old

    =head3 Inserting POD, Comments and C Preprocessor Directives

subsection. This commit only adds text about directives; subsequent
commits will update the various file-scoped keywords.
Populate the new

    =head2 The Structure of an XSUB
    =head2 An XSUB Declaration

sections
Add some initial text for this new section, and also add a new
subsection "XSUB Parameter Placeholders".
Rewrite (and retitle) these three subsections:

    =head3 Default Parameter Values
    =head3 The C<length(NAME)> Keyword
    =head3 Variable-length Parameter Lists
Add text for the new '=head2 The XSUB Input Part' section, and rewrite
the existing entry for the PREINIT keyword.
This commit completely rewrites this section and subsections:

    =head3 The INPUT: Keyword
        =head4 The NO_INIT Keyword
        =head4 Initializing Function Parameters
        =head4 The & Unary Operator

It de-emphasises the INPUT keyword and suggests using ANSI XS signatures
etc instead.
Add text for the new '=head2 The XSUB Init Part' section, and rewrite
the existing entry for the INIT keyword.
Add text to the new

    =head2 The XSUB Code Part
    =head3 Auto-calling a C function

sections, and rewrite the existing

    =head4 The C_ARGS: Keyword

section
Rewrite these sections:

    =head3 The CODE: Keyword
    =head3 The PPCODE: Keyword
This keyword formerly wasn't documented. The docs now say "this is what
it is, but don't use it".
Add text to the new

    =head2 The XSUB Output Part

section, and rewrite the text in these existing sections:

    =head3 The POSTCALL: Keyword
    =head3 The OUTPUT: Keyword
Add text to the new

    =head2 The XSUB Cleanup Part

section, and rewrite the text in this existing section:

    =head3 The CLEANUP: Keyword
Add text to the new

    =head2 XSUB Generic Keywords

section, and rewrite the text in this existing section:

    =head3 The PROTOTYPE: Keyword
iabyn added 12 commits October 2, 2025 12:03
Explain that a 'C' parameter type in an XSUB declaration can actually
be a Perl package name or similar, e.g.

    Foo::Bar
    f(Foo::Bar obj, char *s)
First, add a new subsection

    =head3 T_PTROBJ and opaque handles

to the TYPEMAPs section explaining how this typemap can be used to
map between Perl objects and C library handles. It provides a
fully-worked example of wrapping a simple arithmetic library.

Then completely rewrite the

    =head3 The OVERLOAD: Keyword

section. In particular, it now refers to the new T_PTROBJ example and
shows how it can be extended to use overloading.
This keyword was undocumented, even though it had been added 25 years
ago.
Populate the introduction to this new section.
Rewrite this section:

    =head3 The ALIAS: Keyword
Rewrite these sections:

    =head3 The INTERFACE: Keyword
    =head3 The INTERFACE_MACRO: Keyword

Also demote the second to be a head4 child of the first. Then expand
the T_PTROBJ example to use INTERFACE as an alternative to ALIAS.
Rewrite this section:

    =head3 The CASE: Keyword
Populate this new section (except for the T_PTROBJ subsection, which had
already been added by an earlier commit within this branch).

Note that the "Common typemaps" subsection could probably benefit
from some further expansion by someone familiar with which built-in
T_FOO entries are useful.
Rewrite this section:

    =head2 Using XS With C++

Disclaimer: I've never written a proper C++ program. I had to
(literally) dust off my 34-year-old copy of Stroustrup(*) and also do
some Googling. Hopefully what I've written is sane.

(*) This was bought back in the days when people used to learn things
by buying books, and when I thought that I ought to know something about
this newfangled C++ thing. I never got round to reading all of it: I
discovered Perl around the same time, which looked to be a lot more fun.
Revise the text in this section:

    =head2 Safely Storing Static Data in XS
Rewrite this section:

    =head1 EXAMPLES

Basically, delete the one big example in this section and provide links
to various other examples already present in this document instead.
Tweak the final few sections of perlxs.pod.
Comment on lines +350 to +356
L<perlcall>: this describes how to call Perl functions and do the
equivalent of C<eval ""> from C.

=item *

L<perlembed>: this describes how to embed a complete Perl interpreter
within another application.

perlcall barely mentions eval_pv, perlembed provides much better documentation for eval_pv/eval_sv.


There is a standard system typemap file which contains rules for common C
and Perl types, but you can add your own typemap file in addition, and
from perl 5.16.0 onwards you can also add typemap declarations inline

Shouldn't this refer to the version of EU::PXS that added embedded typemaps (3.01) instead of the perl version?

Practice throughout the document is inconsistent. There are numerous cases of “since 5.*” or “Perl 5.*”, but also some cases where both the Perl version and the ParseXS version are stated, e.g. in the sections SCOPE and ALIAS. And there’s at least one case where only the ParseXS version is stated (I forgot in which section).

I think trying to consistently state both versions might be useful: The ParseXS version because it’s technically the more relevant one here, and the Perl version because it’s more intuitive for many people.

> I think trying to consistently state both versions might be useful: The ParseXS version because it’s technically the more relevant one here, and the Perl version because it’s more intuitive for many people.

No, I believe adding Perl versions will only add confusion here.

Comment on lines +918 to +919
If you need to coerce an SV to a string (e.g. before modifying its string
value) then use C<SvPV_force()> or one of its variants. For example if

Perhaps "before directly modifying its string buffer".

You don't need to force normal before calling sv_setpvn().

@johannessen johannessen left a comment

I left some comments, mostly about minor things.

Overall, I found the document to be well structured and the prose to be easily readable, in spite of the fairly complex topic.

in a particular C library, the XSUB definitions in the XS file are often
just a couple of lines, consisting of a declaration of the name,
parameters and return type. The XS parser will do almost all the heavy
lifting for you,

s/you,/you./

Comment on lines +1547 to +1548
require XSLoader;
XSLoader::load(__PACKAGE__, $VERSION);

XSLoader 0.14 (core since Perl 5.15) offers a simpler syntax:

XSLoader::load();

Given that the goal of this rewrite is to modernise the documentation, should this newer syntax be used in the example?

Comment on lines +1641 to +1643
XSUB to be declared without raising a "duplicate XSUB" warning. This
warning suppression only works for the if/else/endif form. For example
this works:

I was, at first, confused by this text: From reading it, and the examples below it, it seemed like what doesn’t work is the #ifndef directive, specifically. Furthermore, the examples use “ifdef”, whereas this text uses “if”, which made it look like there’s a typo here.

After re-reading the old perlxs document, I understand presence of #else in particular is the key here. I think the new text isn’t very clear if you don’t already know that. How about something like this:

XSUB to be declared without raising a "duplicate XSUB" warning. This
warning suppression only works when an else branch is present. For
example, this works:

Comment on lines +1683 to +1688
REQUIRE: 3.58

The C<REQUIRE> keyword is used to indicate the minimum version of the
C<ExtUtils::ParseXS> XS compiler (and its F<xsubpp> wrapper) needed to
compile the XS module. It is expected to be a floating-point number of the
form C<\d+\.\d+/>. It is analogous to the perl C<use v5.xx>.

-form C<\d+\.\d+/>. It is analogous to the perl C<use v5.xx>.
+form C<\d+\.\d+/>. It is analogous to the perl C<use ExtUtils::ParseXS x.xx>.

Also, the version regex has a trailing / but not a leading /.
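
As a rough illustration of the format under discussion, a properly delimited and anchored pattern can be tried out in plain Perl. This is only a sketch of the "digits, dot, digits" shape quoted above; the helper name `looks_like_xs_version` is hypothetical, and nothing here is meant to describe how ExtUtils::ParseXS itself validates the value.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ExtUtils::ParseXS): returns 1 if $v
# has the "digits, dot, digits" shape described for REQUIRE, else 0.
sub looks_like_xs_version {
    my ($v) = @_;
    return (defined($v) && $v =~ /\A\d+\.\d+\z/) ? 1 : 0;
}

for my $candidate ('3.58', 'v5.36', '3.58/') {
    printf "%-6s %s\n", $candidate,
        looks_like_xs_version($candidate) ? 'matches' : 'does not match';
}
```

Only the first candidate matches; the stray trailing slash in the third is rejected because the pattern is anchored at both ends.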

Comment on lines +2016 to +2018
The name of the XSUB is usually put on the line following the type, in
which case it must be on column one. It is permissible for both the return
type and name to be on the same line.

Very minor nit: I think those sentences would read better if the phrase “return type” was used first.

-The name of the XSUB is usually put on the line following the type, in
-which case it must be on column one. It is permissible for both the return
+The name of the XSUB is usually put on the line following the return type,
+in which case it must be on column one. It is permissible for both the
 type and name to be on the same line.

Comment on lines +3467 to +3469
OVERLOAD: \"\"

Parameters preceded by C<OUTLIST> keyword do not appear in the usage
signature of the generated Perl function.
This could be regarded as a bug.

Does “bug” imply that it might get fixed, and the escaped syntax OVERLOAD: \"\" might become illegal in future?

This feels similar to an explicit OUTPUT: RETVAL being required for non-autocall usage, which is described as “probably a bad design decision, but we're stuck with it now” in the section on CODE above.

Should these two cases be described with similar language, or is there a real difference here with regards to possible future changes?

Comment on lines +3476 to +3479
object); and third, a swap flag. See L<overload> for the full details
of how these functions will be called, with what arguments. Note that
C<swap> can in fact be undef in addition to false, to indicate an assign
overload such as C<+=>.

To detect the difference between swap being false and undef, you’d need to declare it as SV* and use SvOK() and SvTRUE(), right? Should this be pointed out, or is it obvious enough?
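
The three states of the swap flag can be observed from pure Perl, which may help when deciding how much the XS text needs to say. The sketch below uses a hypothetical `My::Num` class with a single `+` handler; at the Perl level the distinction is made with `defined`, which plays the role that `SvOK()` would play on an `SV*` parameter in an XSUB.

```perl
use strict;
use warnings;

package My::Num;   # hypothetical class, for illustration only

use overload '+' => \&add;

sub new { my ($class, $v) = @_; return bless { v => $v }, $class }

my @seen;   # records how the swap flag arrived on each call

sub add {
    my ($self, $other, $swap) = @_;
    push @seen, !defined($swap) ? 'undef' : $swap ? 'true' : 'false';
    my $o = ref($other) ? $other->{v} : $other;
    return My::Num->new($self->{v} + $o);
}

package main;

my $n  = My::Num->new(1);
my $r1 = $n + 2;   # object on the left: swap is defined but false
my $r2 = 2 + $n;   # object on the right: swap is true
$n += 2;           # no '+=' handler, so '+' is called with swap undef

print join(',', @seen), "\n";   # prints "false,true,undef"
```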

Comment on lines +3748 to +3749
C<ALIAS> which is more suited for autocall. Note that C<ALIAS> should not
be used together with either of C<INTERFACE> or C<ATTRS>.

For symmetry with the section on OVERLOAD:

-be used together with either of C<INTERFACE> or C<ATTRS>.
+be used together with either of C<ATTRS>, C<INTERFACE>, or C<OVERLOAD>.

Comment on lines +4098 to +4099
are converted back and forth via C<(UV)> casts. A few unsigned types such
as C<I16> and C<U32> are instead mapped to C<T_U_SHORT> and C<T_U_LONG> XS

s/I16/U16/

Comment on lines +2140 to +2146
=head3 Fully-qualified type names and Perl objects

Foo::Bar
foo(Foo::Bar self, ...)

Normally the type of an XUB's parameter or return value is a valid C type,
such as C<"char *">. However you can also use Perl package names. When a

What happens when such a parameter receives a Perl object blessed to a different package? Is @ISA a factor at all?

I don’t know the answer, but feel like this should be addressed (briefly) somewhere in this document. For example the section The OVERLOAD: Keyword makes it sound like there is an automatic type check, but it remains unclear whether it's just for the svtype or if the package is considered in some way:

will croak with an Expected foo to be of type My::Num, got scalar error

It depends on the typemap, the commonly used T_PTROBJ follows @ISA.

OVERLOAD just does use overload 'op' => \&xs_sub.
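
A Perl-level analogue of that check may make the behaviour easier to picture. The sketch below assumes that the `T_PTROBJ` input check behaves like an `isa` test (to my understanding the stock template uses `sv_derived_from()`); the packages and the helper `accepts_as_my_num` are hypothetical stand-ins, not code from the typemap.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

package My::Num;         # hypothetical class wrapping a C handle
sub new { my ($class) = @_; return bless \(my $handle = 0), $class }

package My::Num::Fast;   # hypothetical subclass
our @ISA = ('My::Num');

package Other;           # unrelated class
sub new { my ($class) = @_; return bless \(my $handle = 0), $class }

package main;

# Assumption: stands in for the check performed on an argument that is
# declared with type My::Num; inheritance through @ISA is honoured.
sub accepts_as_my_num {
    my ($arg) = @_;
    return (blessed($arg) && $arg->isa('My::Num')) ? 1 : 0;
}

print accepts_as_my_num(My::Num->new),       "\n";   # 1: exact class
print accepts_as_my_num(My::Num::Fast->new), "\n";   # 1: subclass via @ISA
print accepts_as_my_num(Other->new),         "\n";   # 0: unrelated class
print accepts_as_my_num(42),                 "\n";   # 0: not an object
```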

Ah, of course. Yes. Thank you for reminding me.

I suggest adding a direct link for the T_PTROBJ documentation, for example like this:

@@ -4135,10 +4135,10 @@
 argument, until finally some sort of destroy function frees the handle and
 its data. The C<T_PTROBJ> typemap is one common method for mapping Perl
-objects to such C library handles. Behind the scenes, it uses blessed
-scalar objects with the scalar's integer value set to the address of the
-handle. The C<INPUT> code template of the C<T_PTROBJ> typemap retrieves the
-pointer from the scalar object referred to by a passed RV argument, while
-the C<OUTPUT> template creates a new blessed RV-to-SV with the handle
-address stored in it.
+objects to such C library handles; see L<perlxstypemap/T_PTROBJ>. Behind the
+scenes, it uses blessed scalar objects with the scalar's integer value set
+to the address of the handle. The C<INPUT> code template of the C<T_PTROBJ>
+typemap runs type checks and retrieves the pointer from the scalar object
+referred to by a passed RV argument, while the C<OUTPUT> template creates a
+new blessed RV-to-SV with the handle address stored in it.
 
 For the purposes of an example, we'll create here a minimal example C


short
baz(int a, char *b = "")
PREINIT:

@Leont Leont Oct 8, 2025

Should we still use PREINIT? I see its use case in C89 for declarations, but in C99 it can be confusing, mainly because it runs before argument handling.

completely within the signature (i.e. which don't use an C<INPUT> section
to specify their type).

Perls before 5.36 used C89 compiler semantics, which didn't allow variable

I don't think this is entirely true. We used C89 for perl itself, but not necessarily for CPAN modules.
